The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice or about the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive for participation (70%), while prize money played only a minor role (16%). Although a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for it, and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
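Two of the strategies named in the survey, patch-based processing of oversized samples and k-fold cross-validation, can be illustrated with a minimal sketch. The function names `patch_starts` and `k_fold_indices` are hypothetical helpers introduced here for illustration; they are not taken from the survey or from any participant's code, and real pipelines would typically add shuffling, overlap blending, and so on.

```python
def patch_starts(length, patch, stride):
    """Start offsets of patches of size `patch` that cover an axis of size `length`."""
    if patch <= 0 or stride <= 0:
        raise ValueError("patch and stride must be positive")
    if patch >= length:
        return [0]  # a single patch already covers the whole axis
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        starts.append(length - patch)  # extra patch flush with the far border
    return starts


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k (train, validation) index pairs."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # The first (n_samples % k) folds receive one extra validation sample.
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, val))
        start += size
    return folds
```

For a 2D or 3D image, `patch_starts` would be called once per axis and the per-axis offsets combined (for example with `itertools.product`) to obtain the patch grid.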
Segmentation-based methods have recently become popular in scene text detection. They mainly consist of two steps: text kernel segmentation and expansion. However, the segmentation process considers each pixel independently, and the expansion process struggles to achieve a favorable accuracy-speed trade-off. In this paper, we propose a Context-aware and Boundary-guided Network (CBN) to tackle these problems. In CBN, a basic text detector is first used to predict initial segmentation results. Then, we propose a context-aware module that enhances text kernel feature representations by considering both global and local contexts. Finally, we introduce a boundary-guided module that adaptively expands the enhanced text kernels using only the pixels on their contours, which not only yields accurate text boundaries but also maintains high speed, especially on high-resolution output maps. In particular, with a lightweight backbone, the basic detector equipped with our proposed CBN achieves state-of-the-art results on several popular benchmarks, and CBN can be plugged into several segmentation-based methods. Code will be available at https://github.com/XiiZhao/cbn.pytorch.
Point cloud registration is a popular topic that has been widely applied in 3D model reconstruction, localization, and retrieval. In this paper, we propose a new registration method, KSS-ICP, which addresses the rigid registration task in Kendall shape space (KSS) with Iterative Closest Point (ICP). The KSS is a quotient space that removes the influence of translation, scale, and rotation for shape feature-based analysis. These influences can be summarized as similarity transformations, which do not change the shape features. The point cloud representation in KSS is therefore invariant to similarity transformations, and we exploit this property to design KSS-ICP for point cloud registration. To tackle the difficulty of obtaining the KSS representation in general, KSS-ICP formulates a practical solution that does not require complex feature analysis, data training, or optimization. With a simple implementation, KSS-ICP achieves more accurate registration of point clouds. It is robust to similarity transformations, non-uniform density, noise, and defective parts. Experiments show that KSS-ICP outperforms the state of the art.
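As a minimal sketch of the invariance idea only, the following removes translation and scale from a point cloud by centering it and normalizing it to unit norm (the usual "pre-shape" step of Kendall's construction). It is not the KSS-ICP algorithm: the rotation alignment that the abstract assigns to ICP is deliberately left out, and `to_preshape` is a hypothetical helper name.

```python
import math


def to_preshape(points):
    """Center a point cloud at its centroid and scale it to unit Frobenius norm.

    `points` is a non-empty list of equal-length coordinate tuples.
    Rotation is NOT removed here; that would need an alignment step such as ICP.
    """
    n = len(points)
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / n for d in range(dim)]
    centered = [[p[d] - centroid[d] for d in range(dim)] for p in points]
    norm = math.sqrt(sum(c * c for row in centered for c in row))
    if norm == 0.0:
        raise ValueError("all points coincide; scale cannot be normalized")
    return [[c / norm for c in row] for row in centered]
```

Two clouds that differ only by a translation and a positive uniform scale map to the same pre-shape, which is why the subsequent registration only has to resolve rotation and correspondence.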
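Convolution-based methods provide good segmentation performance in medical image segmentation tasks. However, these methods face the following challenges when dealing with the edges of medical images: (1) Previous convolution-based methods do not focus on the boundary relationship between foreground and background around the segmentation edge, which degrades segmentation performance when the edge varies. (2) The inductive bias of the convolutional layer cannot adapt to complex edge changes and to the aggregation of multiple segmented regions, so its performance gains are mostly limited to the body of the segmented regions rather than their edges. To address these challenges, we propose the CM-MLP framework, built on the MFI (Multi-scale Feature Interaction) block and the ACRE (Axial Context Relation Encoder) block, for accurate segmentation of the edges of medical images. In the MFI block, we propose a cascade multi-scale MLP (Cascade MLP) that processes all local information from the deeper layers of the network simultaneously and uses a cascade multi-scale mechanism to gradually fuse discrete local information. The ACRE block is then used to make the deep supervision focus on exploring the boundary relationship between foreground and background in order to refine the edges of the medical image. The segmentation accuracy (Dice) of our proposed CM-MLP framework reaches 96.96%, 96.76%, and 82.54% on three benchmark datasets (the CVC-ClinicDB dataset, the sub-Kvasir dataset, and our in-house dataset, respectively), surpassing state-of-the-art methods. The source code and trained models will be available at https://github.com/programmerhyy/cm-mlp.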
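For reference, the Dice score used as the accuracy measure here can be computed for binary masks as in the sketch below. The function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration, not details from the paper.

```python
def dice_score(pred, target):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two flat binary masks (0/1 values)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total
```

A 2D or 3D mask would be flattened before the call; because Dice weights the overlap against the combined mask size, small errors along thin edges affect it less than errors in the body of a region.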
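In this paper, we propose a Path-aware Siamese Graph neural network (PSG) algorithm for link prediction tasks. First, PSG captures both node and edge features for a given pair of nodes, namely the structural information of their k-neighborhoods and the relay path information between the nodes. In addition, PSG uses a Siamese graph neural network to represent two contrastive links: a positive link and a negative link. We evaluate the proposed PSG algorithm on ogbl-ddi, a link property prediction dataset of the Open Graph Benchmark (OGB). PSG achieves top-1 performance on ogbl-ddi. The experimental results verify the superiority of PSG.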
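Aesthetic assessment of images can be divided into two main forms: numerical assessment and language assessment. Aesthetic captioning of photographs is the only aesthetic language assessment task that has been addressed so far. In this paper, we propose a new aesthetic assessment task: aesthetic visual question answering (AVQA) of images. Given a question about the aesthetics of an image, the model predicts the answer. We use images from www.flickr.com. The objective QA pairs are generated by the proposed aesthetic attribute analysis algorithms. In addition, we introduce subjective QA pairs, which are converted from aesthetic numerical labels and from sentiment analysis by large-scale pretrained models. We build the first aesthetic visual question answering dataset, AESVQA, which contains 72,168 high-quality images and 324,756 pairs of aesthetic questions. Two methods for adjusting the data distribution are proposed and shown to improve the accuracy of existing models. This is the first work that both addresses the aesthetic VQA task and introduces subjectivity into VQA tasks. Experimental results show that our methods outperform other VQA models on this new task.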
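Recently, pioneering research has proposed a large number of acoustic features (log power spectrogram, linear frequency cepstral coefficients, constant-Q cepstral coefficients, etc.) for audio deepfake detection, obtaining good performance and showing that different subbands contribute differently to audio deepfake detection. However, this work lacks an explanation of the specific information carried in each subband, and these features also lose information such as phase. In speech synthesis, fundamental frequency (F0) information is used to improve the quality of synthetic speech, yet the F0 of synthetic speech is still too averaged and differs significantly from that of real speech. F0 can therefore be expected to serve as important information for discriminating between bona fide and fake speech, but it cannot be used directly because of its irregular distribution. Instead, the frequency band containing most of the F0 is selected as the input feature. Meanwhile, to make full use of phase and full-band information, we also propose to use real and imaginary spectrogram features as complementary input features and to model the disjoint subbands separately. Finally, the results of the F0, real, and imaginary spectrogram features are fused. Experimental results on the ASVspoof 2019 LA dataset show that our proposed system is very effective for the audio deepfake detection task, achieving an equal error rate (EER) of 0.43%, which surpasses almost all systems.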
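The equal error rate reported here is the operating point at which the false acceptance rate equals the false rejection rate. A minimal sketch of how it can be estimated from detection scores follows; it assumes that a higher score means "more likely bona fide" and uses a simple threshold sweep without interpolation, so it is an approximation rather than the official ASVspoof scoring tool. The name `equal_error_rate` is hypothetical.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate EER by sweeping every observed score as a decision threshold.

    Assumed convention: a trial is accepted as bona fide if score >= threshold.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")
    best_gap, eer = None, None
    for threshold in sorted(set(bonafide_scores) | set(spoof_scores)):
        # False acceptance: spoofed trials that pass the threshold.
        far = sum(s >= threshold for s in spoof_scores) / len(spoof_scores)
        # False rejection: bona fide trials that fall below the threshold.
        frr = sum(s < threshold for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer
```

With perfectly separated score distributions the function returns 0.0; when the two error curves do not cross exactly at an observed score, the value is the midpoint of the closest pair.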
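Traditional multicast routing methods have several problems when constructing a multicast tree, such as limited access to network state information, poor adaptability to dynamic and complex changes in the network, and inflexible data forwarding. To address these shortcomings, the optimal multicast routing problem in software-defined networking (SDN) is formulated as a multi-objective optimization problem, and an intelligent multicast routing algorithm, DRL-M4MR, based on the deep Q-network (DQN) method of deep reinforcement learning (DRL), is designed to construct multicast trees in SDN. First, by combining the global view and control of SDN, the multicast tree state matrix, link bandwidth matrix, link delay matrix, and link packet-loss-rate matrix are designed as the state space of the DRL agent. Second, the action space of the agent consists of all links in the network, and the action selection strategy is designed to add links to the current multicast tree under four cases. Third, single-step and final reward functions are designed to guide the agent toward decisions that construct the optimal multicast tree. Experimental results show that, compared with existing algorithms, the multicast trees constructed by DRL-M4MR achieve better bandwidth, delay, and packet loss rate after training, and that DRL-M4MR makes more intelligent multicast routing decisions in a dynamic network environment.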
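In recent years, the research focus of scene text detection and recognition has shifted to arbitrary-shape text, for which text shape representation is a fundamental problem. In our opinion, an ideal representation should be compact, complete, efficient, and reusable for subsequent recognition. However, previous representations fall short in one or more of these aspects. The Thin-Plate-Spline (TPS) transformation has achieved great success in scene text recognition. Inspired by this, we reverse its usage and treat TPS as a fine-grained representation for arbitrary-shape text. The TPS representation is compact, complete, and efficient. With the predicted TPS parameters, the detected text region can be directly rectified to a near-horizontal one to assist subsequent recognition. To further exploit the potential of the TPS representation, a Border Alignment Loss is proposed. Based on these designs, we implement the text detector TPSNet, which can be conveniently extended to a text spotter. Extensive evaluation and ablation on several public benchmarks demonstrate the effectiveness and superiority of the proposed text representation and spotting method. In particular, TPSNet improves the detection F-measure by 4.4% (78.4% vs. 74.0%) on the ArT dataset and the end-to-end spotting F-measure by 5.0% (78.5% vs. 73.5%) on Total-Text, which are large margins achieved without bells and whistles.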
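Traditional industrial recommenders are usually trained on a single business domain and then serve that domain. In large commercial platforms, however, recommenders often need to make click-through rate (CTR) predictions for multiple business domains. Different domains have overlapping user groups and items, so commonalities exist. Because specific user groups differ and user behavior may change across business domains, distinctions exist as well. These distinctions lead to domain-specific data distributions, which make it hard for a single shared model to work well on all domains. To learn an effective and efficient CTR model that handles multiple domains simultaneously, we present the Star Topology Adaptive Recommender (STAR). Concretely, STAR has a star topology consisting of shared central parameters and domain-specific parameters. The shared parameters learn the commonalities of all domains, and the domain-specific parameters capture domain distinctions for more refined prediction. Given requests from different business domains, STAR adapts its parameters according to the domain characteristics. Experimental results on production data validate the superiority of the proposed STAR model. Since 2020, STAR has been deployed in the display advertising system of Alibaba, obtaining an average improvement of 8.0% in CTR and 6.0% in RPM (revenue per mille).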
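The star topology can be sketched as a single linear layer whose effective parameters depend on the requesting domain. The abstract only states that shared and domain-specific parameters exist; the combination rule used below (element-wise product of the weights, sum of the biases) is an assumption made for illustration, and `star_linear` is a hypothetical name.

```python
def star_linear(x, shared_w, domain_w, shared_b, domain_b):
    """One linear layer with domain-adapted parameters.

    Assumed combination rule (not stated in the abstract):
      effective weight = shared_w * domain_w  (element-wise)
      effective bias   = shared_b + domain_b
    `x` has length n_in; the weight matrices are n_in x n_out nested lists.
    """
    n_out = len(shared_b)
    out = []
    for j in range(n_out):
        acc = shared_b[j] + domain_b[j]
        for i, xi in enumerate(x):
            acc += xi * shared_w[i][j] * domain_w[i][j]
        out.append(acc)
    return out
```

At serving time, the domain identifier of a request would select which `domain_w` and `domain_b` to pass in, while `shared_w` and `shared_b` stay the same for every domain; a domain whose weights are all ones and whose biases are all zeros falls back to the shared model.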